AITopics | main text

Collaborating Authors

main text

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Phase Transitions in Attention: A Bayesian Theory of Copy Head Emergence

Lavie, Itay, Fischer, Kirsten, Lekov, Andrey, Van Maele, Frederic, Ringel, Zohar, Helias, Moritz

arXiv.org Machine LearningJun-11-2026

Attention is the key mechanism underlying in-context learning in transformers, and attention patterns have been observed empirically to emerge abruptly during training. We present a Bayesian theory of feature learning in attention; we then focus on how the copy subcircuit in the first layer of an induction head is learned by analyzing a single-layer softmax attention network trained on a copy task. We derive a closed-form posterior over the attention matrix and reduce it to a low-dimensional order parameter space. This reduction reveals a phase transition in the amount of training data, which we verify using both Bayesian sampling and standard training with Adam. We contrast our results with linear attention and find that softmax attention exhibits a \emph{first-order phase transition} while in linear attention an initial \emph{second-order phase transition} is followed by a smooth, continuous evolution toward the structured attention pattern (\emph{crossover}). Our work provides a first-principles theoretical account of the abrupt emergence of the copy subcircuit, reminiscent of the one observed in training large language models.

machine learning, natural language, transition, (18 more...)

arXiv.org Machine Learning

2606.12058

Country:

North America > United States (0.45)
Europe > Germany (0.28)
Asia > Middle East (0.28)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

OTSS: Output-Targeted Soft Segmentation for Contextual Decision-Weight Learning

Hu, Renjun, Ahn, Hyun-Soo

arXiv.org Machine LearningMay-4-2026

Many machine learning systems make constrained decisions by optimizing factorized objectives, but the context-specific objective is often treated as fixed. We study contextual decision-weight learning: from logged decisions and proxy outputs, learn an optimizer-facing weight vector w(x) over interpretable decision factors z(x,d), rather than a direct policy or generic predictive score. We propose OTSS, an output-targeted soft-segmentation model that deploys the personalized decision-ready weight vector. At the function-class level, the theory highlights a hard-versus-soft distinction. Hard partitions incur an approximation-estimation tradeoff under overlap, while a realizable fixed-K soft class removes the hard-partition approximation floor and attains a parametric rate. We evaluate OTSS in controlled benchmarks with finite evaluation libraries, where the true weight vector and downstream regret can be computed exactly. In the representative overlap setting, OTSS attains the lowest mean regret among the comparators, including EM mixture regression, the strongest soft-mixture baseline in our comparison; it matches EM on coefficient recovery while running about two orders of magnitude faster. In a matched K=5 benchmark, OTSS remains competitive under hard-routed truth and improves as heterogeneity becomes softer and sample size grows. On a fixed Complete Journey retail anchor with real household covariates and action geometry, OTSS again achieves the lowest mean-regret point estimate.

artificial intelligence, benchmark, machine learning, (14 more...)

arXiv.org Machine Learning

2605.00193

Genre: Research Report > Experimental Study (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Supplementary for Neural Methods for Point-wise Dependency Estimation

Neural Information Processing SystemsApr-30-2026, 19:48:33 GMT

In this section, we shall show detailed derivations for the point-wise dependency estimation methods. Four approaches are discussed: Variational Bounds of Mutual Information, Density Matching, Probabilistic Classifier, and Density-Ratio Fitting. For convenience, we define Ω = X Y. We have PX,Y and PXPY (can also be written as PX PY) be the probability measures over σ algebras over Ω with their probability densities being the Radon-Nikodym derivatives (i.e., p(x,y) = dPX,Y/dµ and p(x)p(y) = dPXPY/dµwith µbeing the Lebesgue measure). These estimators have the logarithm of point-wise dependency (PMI) as the intermediate product, which we will show in the following. We denote Mbe any class of functions m: Ω R. Proposition 1 (INWJ and its neural estimation, restating Nguyen-Wainwright-Jordan bound [5, 18]).

artificial intelligence, machine learning, objective, (15 more...)

Neural Information Processing Systems

Country: Asia > Middle East > Jordan (0.24)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.88)

Add feedback

Learning Functional Transduction: S.I. Contents

Neural Information Processing SystemsApr-30-2026, 04:22:35 GMT

We propose below the proofs of the results presented in the main text. Most of the arguments are adapted from the development proposed in (Zhang, 2013) which goes beyond real or complex-valued RKBS developed in (Zhang et al., 2009; Song et al., 2013) to develop the notion of vector-valued RKBS. In addition, we note that assumptions regarding the properties of the RKBS of interests such as uniform Fréchet differentiability and uniform convexity have been further relaxed in other works (Xu and Ye, 2019; Lin et al., 2022) but are here sufficient for our discussion since they guarantee the unicity of a semi-inner product x.,.yB compatible with the norm ||.||B (Giles, 1967). S.1.1 Theoretical results Theorem 1 Theorem 1 gathers for the sake of compactness the definition of a vector-valued reproducing kernel Banach space with the properties of existence and unicity of the kernel K. Proof. For any v PV and u PU, the mapping OÞÑ xOpvq,uyU is a bounded linear form in LpBq.

artificial intelligence, experiment, machine learning, (16 more...)

Neural Information Processing Systems

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Add feedback

Appendix

Neural Information Processing SystemsApr-26-2026, 04:51:49 GMT

AAbout Equation (1) As we discussed in Section 3, label smoothing and focal loss are equivalent to the standard CE loss with an additional maximum-entropy regularizer (see in Equation (1) and (2) in the main text). The proof of Equation (2) can be found in the corresponding paper [4]. SVHN is an image dataset which consists of 32 32 colored images of 0 9 digits. CIFAR-10 and CIFAR-100 consist of 32 32 colored natural images arranged in 10 and 100 classes, respectively. For 20Newsgroups, we use the GloVe word embedding [7] for text representation before the 1D-CNN model and set the embedding dimension as 100.

artificial intelligence, ece, machine learning, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

55a7cf9c71f1c9c495413f934dd1a158-Supplemental.pdf

Neural Information Processing SystemsApr-25-2026, 23:43:36 GMT

artificial intelligence, k-net, kernel update, (13 more...)

Neural Information Processing Systems

Country: Asia > China (0.15)

Technology: Information Technology > Artificial Intelligence > Vision (1.00)

Add feedback

532b81fa223a1b1ec74139a5b8151d12-Supplemental.pdf

Neural Information Processing SystemsApr-25-2026, 22:33:10 GMT

artificial intelligence, machine learning, relative standard deviation, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Supplement to " Uniform Concentration Bounds toward a Unified Framework for Robust Clustering "

Neural Information Processing SystemsApr-25-2026, 16:23:28 GMT

For the theoretical exposition, we first establish the following Lemmas. Lemma A.1 proves that the derivative of the function φis bounded in the `2-norm when the domain is restricted to the support of P. Lemma A.1. Lemma A.3 proves that the function fΘ, as a function of Θ, is Lipschitz with respect to the k k norm. Joint first authors contributed equally Corresponding author 35th Conference on Neural Information Processing Systems (NeurIPS 2021). Thus, from equation (1), h φ(PC(θ)) φ(θ),x PC(θ)i 0. (2) We now observe that, dφ(x,θ) dφ(x,PC(θ)) dφ(PC(θ),θ) = h φ(PC(θ)) φ(θ),x PC(θ)i 0. Hence the result.

artificial intelligence, machine learning, momnl, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

Polyhedron Attention Module: Learning Adaptive-order Interactions Anonymous Author(s) Affiliation Address email Appendixes1

Neural Information Processing SystemsApr-25-2026, 15:31:13 GMT

Contents2 ADeriving Eq. 2. 23 BThe hyperplane set generated by the oblique tree is a superset of that created by the4 ReLU-activated plain DNN 35 CProof of Theorem 1 46 DProof of Theorem 2 57 EProof of Theorem 3 68 FProof of Theorem 4 79 GImplementation Detail 810 We consider a L-layer (L 2) ReLU activated plain DNN module f: Rn0 RnL with input12 x Rp. Eq. 2 in the main text can be30 obtained by rewriting P An oblique tree is a binary tree where each node splits the space by a hyperplane rather than by34 thresholding a single feature. The tree starts with the root of the full input space S, and by recursively35 splitting S, the tree grows deeper. For a D-depth (D 3) binary tree, there are 2D 1 1 internal36 nodes and 2D 1 leaf nodes. As shown in Figure 1, each internal and leaf node maintains a sub-space37 representing a polyhedron in S, and each layer of the tree corresponds to a partition of the input38 space into polyhedrons.

activation state, artificial intelligence, machine learning, (14 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Controlled object Main model Outputfunk(hm) CB(hm) = hˆLfunk(hs,ds) CF(hs) Inputhmhmhs, dshs

Neural Information Processing SystemsApr-25-2026, 14:24:51 GMT

There are no explicit equations for the cerebellum traditionally also has access to a desired state ds (in particular, one can consider this a and forward DNI, respectively; L denotes the loss function. In addition, the inverse model of the of a motor area and sensory area, respectively; CB,CF denotes the computation of backward DNI Notation is largely consistent with section 2 of the main text: hm,hs denotes the hidden activity properties of the inverse model of the cerebellum can be set against those of forward DNI (red). Controller Neocortex Main model Cerebellum Synthesiser Forward Model Backward DNIInverse Model Forward DNI be summarised in table S1. In general, the likeness in formulation between DNI and the cerebellar internal model hypothesis can backward DNI where the main model is an motor-associated RNN. In fact, it was recently suggested that the cerebellum out that though the temporal case of forward DNI was not originally considered in [5], there remain learns to mimic the forward computations which then take place in the neocortex.

artificial intelligence, caption, machine learning, (17 more...)

Neural Information Processing Systems

Industry: Health & Medicine > Therapeutic Area > Neurology (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.80)

Add feedback